Skip to content

feat(run-ops): webapp write path — trigger/batch minting, idempotency routing, run lifecycle#4118

Merged
d-cs merged 15 commits into
mainfrom
runops/pr06-write-path
Jul 3, 2026
Merged

feat(run-ops): webapp write path — trigger/batch minting, idempotency routing, run lifecycle#4118
d-cs merged 15 commits into
mainfrom
runops/pr06-write-path

Conversation

@d-cs

@d-cs d-cs commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator

What

Routes the webapp write path through the run-ops split seam: trigger/batch minting, idempotency-key resolution, and the run-lifecycle services now determine residency and dispatch writes to the correct store.

  • Trigger & batch (runEngine/services/triggerTask.server.ts, batchTrigger.server.ts, createBatch.server.ts, streamBatchItems.server.ts, v3/services/batchTriggerV3.server.ts): mint ids with the run-ops-aware minting and route creation/streaming through the store; batch children inherit the parent's residency.
  • Idempotency (runEngine/concerns/idempotencyKeys.server.ts + new idempotencyResidency.server.ts): idempotency-key lookup/dedup is residency-aware so a keyed retrigger resolves against the store that owns the original run.
  • Run lifecycle services (createCheckpoint, createTaskRunAttempt, enqueueDelayedRun, expireEnqueuedRun, finalizeTaskRun, resumeBatchRun, cancelDevSessionRuns, executeTasksWaitingForDeploy, triggerFailedTask): resolve their target run through the store rather than a fixed client.
  • Reads that fan out from writes (runsRepository + clickhouseRunsRepository, BulkActionV2 + batch read-through, realtime sessions/runReader, alerts deliverAlert/performTaskRunAlerts): route through the read-through resolver.
  • 9535ae63d — resolves the parent run through an injectable run store in TriggerFailedTaskService.
  • bf8f7c881 — drops the "known-migrated" concept from write-path and read repos; residency is id-shape only.
  • 515b897ea — self-defaults resolveWaitpointThroughReadThrough to the safe run-ops clients.

Why

PR6 of the run-ops split stack. This is the write-path counterpart to the read foundation in the previous PRs: with it in place, both reads and writes route through the seam. Additive when the split is disabled (id-shape resolution collapses to the control-plane client); behavior-changing on the minting, idempotency, and lifecycle paths when enabled.

Tests

Large new/expanded vitest suite under apps/webapp/test/ and colocated service tests: trigger-task and batch-trigger store routing, residency inheritance, idempotency dedup residency + legacy-authority, bulk-action read routing, cancel-dev-session routing, alerts store routing, runs-repository read-through, realtime session/run-reader read-through and stream-registration routing, and the waitpoint read-through default. Testcontainers-backed; no mocks.

Notes

Draft, stacked on #4117 (runops/pr05-webapp-foundation). Review that first; this diff is against it.

Server-change / changeset note to be added at stack-assembly time.

🤖 Generated with Claude Code

@changeset-bot

changeset-bot Bot commented Jul 2, 2026

Copy link
Copy Markdown

⚠️ No Changeset found

Latest commit: ea22f52

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Walkthrough

This PR migrates webapp write and read paths for run triggering, batch processing, idempotency-key resolution, and alert/session serialization from direct Prisma access to a RunStore/ControlPlaneResolver abstraction supporting split reads across legacy and new (run-ops) Postgres databases. It adds mint-kind-aware friendly-ID generation (KSUID vs CUID) for runs, batches, and failed-task creation to preserve residency inheritance across parent/child runs. New helpers (resolveIdempotencyDedupClient, resolveWaitpointThroughReadThrough, hydrateRunsAcrossSeam) route reads/writes based on split-mode configuration and ID classification. Numerous services (trigger, batch trigger, alerts, sessions, runs repository, cancel/resume flows) are updated accordingly, alongside extensive new integration tests using heterogeneous Postgres containers to validate store routing.

Changes

Cohort: RunStore/control-plane seam migration

  • Idempotency dedup client resolution: new resolveIdempotencyDedupClient helper and wiring into IdempotencyKeyConcern
  • Waitpoint read-through resolver: new resolveWaitpointThroughReadThrough helper
  • Trigger task/failed-task friendly-id minting with mint-kind resolution and residency inheritance
  • Batch trigger services (RunEngineBatchTriggerService, CreateBatchService, StreamBatchItemsService, BatchTriggerV3Service) migrated to runStore/mintBatchFriendlyId
  • Control-plane resolver adoption across run lifecycle services (task run attempts, enqueue/expire, finalize, alerts)
  • Run reading/session/bulk-action seam adoption (runsRepository, ClickHouse hydration, sessions, cancel/resume flows)
  • Documentation note added describing the run-ops write-path routing change
  • Minor findLatestSession signature update to accept an injectable Prisma client

Sequence Diagram(s)

sequenceDiagram
  participant Caller
  participant TriggerTaskService
  participant ResolveMintKind
  participant RunStore
  participant PrismaNew
  participant PrismaLegacy

  Caller->>TriggerTaskService: call(triggerRequest)
  TriggerTaskService->>ResolveMintKind: resolveRunIdMintKind / resolveInheritedMintKind
  ResolveMintKind-->>TriggerTaskService: mintKind (ksuid | cuid)
  TriggerTaskService->>TriggerTaskService: mintRunFriendlyId(environment, parentRunFriendlyId)
  TriggerTaskService->>RunStore: findRun(parentRunId) [if parentRunId provided]
  RunStore->>PrismaNew: query (if NEW residency)
  RunStore->>PrismaLegacy: query (if LEGACY residency)
  RunStore-->>TriggerTaskService: parentRun | null
  TriggerTaskService-->>Caller: created run (friendlyId, depth, parentTaskRunId)
Loading

Compact Metadata

Estimated code review effort: 5/5 (very high — 60+ files touched, high-risk data-routing logic, extensive but distributed test coverage)
Related PRs: None specified
Suggested labels: run-engine, database, run-ops-split, needs-careful-review
Suggested reviewers: Based on the scope of Prisma/RunStore seam changes across trigger, batch, and alert services, reviewers familiar with run-engine internals and database migration strategy should review this PR.

Poem:
A rabbit hops between two dens, PG14 and PG17 too,
Minting KSUIDs and CUIDs anew, deciding which store gets the queue.
Idempotency keys now dedupe with care,
Batches seal and runs find their lair.
Through the seam we hop, split but never lost—
Every run finds its home, whatever the cost. 🐰

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Description check ⚠️ Warning The description covers scope, rationale, and tests, but it does not follow the required template or include Closes #, checklist, changelog, or screenshots sections. Add the missing template sections, especially Closes #, checklist items, Testing, Changelog, and Screenshots.
✅ Passed checks (4 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
Title check ✅ Passed The title clearly summarizes the main change: routing the webapp write path through run-ops for minting, idempotency, and lifecycle flows.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch runops/pr06-write-path

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

devin-ai-integration[bot]

This comment was marked as resolved.

@d-cs d-cs force-pushed the runops/pr05-webapp-foundation branch from 413a945 to 99643f8 Compare July 2, 2026 18:02
@d-cs d-cs force-pushed the runops/pr06-write-path branch from 515b897 to cb97148 Compare July 2, 2026 18:02
@pkg-pr-new

pkg-pr-new Bot commented Jul 2, 2026

Copy link
Copy Markdown

Open in StackBlitz

@trigger.dev/build

npm i https://pkg.pr.new/@trigger.dev/build@ea22f52

trigger.dev

npm i https://pkg.pr.new/trigger.dev@ea22f52

@trigger.dev/core

npm i https://pkg.pr.new/@trigger.dev/core@ea22f52

@trigger.dev/python

npm i https://pkg.pr.new/@trigger.dev/python@ea22f52

@trigger.dev/react-hooks

npm i https://pkg.pr.new/@trigger.dev/react-hooks@ea22f52

@trigger.dev/redis-worker

npm i https://pkg.pr.new/@trigger.dev/redis-worker@ea22f52

@trigger.dev/rsc

npm i https://pkg.pr.new/@trigger.dev/rsc@ea22f52

@trigger.dev/schema-to-json

npm i https://pkg.pr.new/@trigger.dev/schema-to-json@ea22f52

@trigger.dev/sdk

npm i https://pkg.pr.new/@trigger.dev/sdk@ea22f52

commit: ea22f52

coderabbitai[bot]

This comment was marked as resolved.

@d-cs

d-cs commented Jul 2, 2026

Copy link
Copy Markdown
Collaborator Author

Addressed the outside-diff note on idempotencyKeys.server.ts (blockRunWithWaitpoint writing via this.prisma): the block now passes the residency-resolved dedupClient as tx, so the idempotent parent run's waitpoint write lands on the store that owns that parent run rather than the fallback client.

@d-cs d-cs force-pushed the runops/pr05-webapp-foundation branch from 26871d5 to cdc4eb9 Compare July 2, 2026 19:25
@d-cs d-cs force-pushed the runops/pr06-write-path branch from c59d9c5 to d5d7fa1 Compare July 2, 2026 19:25
@d-cs d-cs force-pushed the runops/pr05-webapp-foundation branch from cdc4eb9 to e0b35d5 Compare July 2, 2026 20:21
@d-cs d-cs force-pushed the runops/pr06-write-path branch 3 times, most recently from 0db90f0 to d5415e8 Compare July 2, 2026 21:44
@d-cs d-cs force-pushed the runops/pr05-webapp-foundation branch from 8024e36 to f9b9b0b Compare July 3, 2026 08:51
@d-cs d-cs force-pushed the runops/pr06-write-path branch from aa55b6b to 3153bc4 Compare July 3, 2026 08:51
@d-cs d-cs force-pushed the runops/pr05-webapp-foundation branch from f9b9b0b to 0937b15 Compare July 3, 2026 10:02
@d-cs d-cs force-pushed the runops/pr06-write-path branch from 3153bc4 to d561590 Compare July 3, 2026 10:02
@d-cs d-cs force-pushed the runops/pr05-webapp-foundation branch from 0937b15 to 729daf1 Compare July 3, 2026 10:36
@d-cs d-cs force-pushed the runops/pr06-write-path branch from d561590 to 9e7c367 Compare July 3, 2026 10:36
@d-cs d-cs force-pushed the runops/pr05-webapp-foundation branch from 729daf1 to bd6fc79 Compare July 3, 2026 10:44
@d-cs d-cs force-pushed the runops/pr06-write-path branch from 9e7c367 to e23432d Compare July 3, 2026 10:44
@d-cs d-cs force-pushed the runops/pr05-webapp-foundation branch from bd6fc79 to a7e0846 Compare July 3, 2026 11:08
@d-cs d-cs force-pushed the runops/pr06-write-path branch from e23432d to 8dff8b2 Compare July 3, 2026 11:08
@d-cs d-cs force-pushed the runops/pr05-webapp-foundation branch from a7e0846 to 4119616 Compare July 3, 2026 12:08
@d-cs d-cs force-pushed the runops/pr06-write-path branch 2 times, most recently from 891d81a to 5140cbc Compare July 3, 2026 15:42
@d-cs d-cs force-pushed the runops/pr05-webapp-foundation branch from d087c25 to b554794 Compare July 3, 2026 16:33
@d-cs d-cs force-pushed the runops/pr05-webapp-foundation branch from b554794 to 071cdc1 Compare July 3, 2026 16:44
@d-cs d-cs force-pushed the runops/pr06-write-path branch from f8f3096 to 4bda37a Compare July 3, 2026 16:44
Base automatically changed from runops/pr05-webapp-foundation to main July 3, 2026 17:02
d-cs and others added 15 commits July 3, 2026 18:03
… routing, run lifecycle

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e in TriggerFailedTaskService

TriggerFailedTaskService read the parent run via the ambient module-singleton
store while the engine wrote the run through its own store, so a ksuid parent's
row was not found and parentTaskRunId came back null. Add an optional injected
runStore (defaults to the shared singleton, preserving production behaviour) and
resolve the parent through it at both call sites, mirroring triggerTask.server.ts.

Align the three affected webapp tests to read through the same store the engine
wrote to: triggerFailedTask.test.ts passes engine.runStore; performTaskRunAlerts
routing passes a passthrough store over the seeded container; triggerTask.test.ts
stubs the run-ops db handles and pins split mode off so the idempotency dedup uses
the container client.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…id-shape only

Migration is deferred, so child/batch residency is a pure id-shape check.
Remove the isKnownMigrated (and mint-only isSplitEnabled) deps from the mint
sites (triggerTask, triggerFailedTask, batchTriggerV3) and call the now-
synchronous resolveInheritedMintKind(parentFriendlyId) with no deps arg.

Read paths: drop the isKnownMigrated re-probe-avoidance from the ClickHouse
runs hydrate (probe all missing on legacy), the runsRepository readThrough
options type, resolveWaitpointThroughReadThrough deps, and the BulkActionV2
batch seam adapter — keeping the genuine cross-seam fallback that reads NEW
first for unclassifiable/legacy-candidate ids.

Delete the injected-marker test cases; the remaining residency tests assert
pure id-shape inheritance.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s and test names

Review hygiene only: remove the NEW-1 label, Test X: name prefixes, and
[TEST-NEWSEED] comment label. No product logic or test behavior changed.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…o safe run-ops clients

The read-through concern defaulted both newClient and legacyReplica to $replica
(control-plane), so a bare caller that omits `deps` — the waitpoints wait route —
never queried the dedicated run-ops replica. A co-located, NEW-resident waitpoint
minted by streams.input().wait() lives on the run-ops-new DB, so the read missed,
returned null, and the route 404'd (re-serialized to 500).

Match the deps the complete/callback routes pass: default newClient to
runOpsNewReplica, legacyReplica to $replica, and splitEnabled to
runOpsSplitReadEnabled — mirroring readThroughRun's own self-defaulting. This
immunizes any bare caller (present or future) against the control-plane pin,
without touching the wait route. The wait/complete/callback call sites live on a
higher branch and are unchanged; complete/callback keep their explicit deps
(now redundant but harmless).

Adds a heteroRunOps regression case driving the concern with no `deps` via the
`defaults` DI seam: proves the old $replica default misses a NEW-resident
waitpoint (null) while the safe run-ops default finds it. No mocks; the fallback
is exercised against real PG14/PG17 containers.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rvice prisma to the resolved store

- Block the idempotent parent run's waitpoint via the residency-resolved
  dedup client instead of the fallback prisma, so the write lands on the
  store that owns the parent run.
- Pass the caller-provided _prisma into WithRunEngine so a custom store
  isn't silently overridden by the module singleton.
- Throw when a run-backed alert's environment can't be resolved instead of
  marking it SENT, so a transient replica miss doesn't permanently suppress
  the alert.
- Pin splitEnabled:false in the waitpoint passthrough test so it exercises
  single-DB behaviour rather than relying on ksuid residency.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The write-path split added static `runOpsLegacyPrisma`/`runOpsNewPrisma`
imports to idempotencyKeys.server.ts, which this test loads. vitest
validates every named import against the `~/db.server` mock, so the mock
now errored on the missing run-ops singletons. Add the four run-ops
exports (empty stubs, same boundary pattern as the batchTriggerV3
residency test) and pin isSplitEnabled() to false so the dedup routing
deterministically returns the injected fake prisma regardless of the
ambient RUN_OPS_SPLIT_ENABLED.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…setup

Worker/engine/marqs/pubsub/socket singletons each construct an ioredis
client at import time (singleton() + no lazyConnect), so any test importing
the service graph opened real Redis connections on import. In CI there is no
Redis, so these accumulate infinite-retry clients across a shard and take
the suite down (locally they pass only because dev Redis is up).

Globally mock the eager-Redis modules to no-op stubs in test/setup.ts:
commonWorker, batchTriggerWorker, legacyRunEngineWorker, alertsWorker,
the RunEngine and MarQS singletons, devPubSub and the socket.io server.
Only these singletons are mocked — never the run store (~/v3/runStore.server,
~/db.server), which store-routing/residency tests need real against
testcontainer Postgres.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…yConnect + stub runtime Redis singletons

The setup-file mocks of the six eager worker/engine singletons were not
enough: CI shards still flooded ECONNREFUSED/maxRetries. Two further
classes of env-Redis usage survived them, reproduced locally by running
the failing shards with REDIS_PORT pointed at a dead port:

1. Import-time construction: ~15 more singletons (platform cache,
   billing-limit reconcile queue, alerts rate limiter, DevPresence,
   auto-increment counter, s2 token cache, v1 streams cache, ...)
   build ioredis clients at module import, and ioredis dials on
   construction. A global ioredis mock now forces lazyConnect: true so
   clients only dial on first command — testcontainer-backed tests are
   unaffected (their first command connects as before).

2. Runtime commands inside code under test: tracePubSub.publish()
   (eventRepository writes), alertsRateLimiter.check() (deliverAlert)
   and the task metadata cache each issue commands against
   env-configured Redis mid-test; every command burns ~20 reconnect
   cycles before its error surfaces, which times the tests out. These
   three modules are now stubbed (metadata cache pinned to its Noop
   implementation, which is what CI's unset env resolves to anyway).

Verified: webapp shards 2/5/6/8 (the ones failing on the pr06+ stack)
run green with Redis pointed at a dead port, and shards 2/8 stay green
against live Redis (store-routing suites still exercise the real run
store).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…in CI

CI runners have no .env, no REDIS_HOST/REDIS_PORT, and no Postgres at
localhost:5432, which surfaced two failure layers that local runs mask
(the dev stack answers on both):

- suites transitively importing triggerTaskV1.server failed to collect
  because autoIncrementCounter.server.ts throws at import when
  REDIS_HOST/REDIS_PORT are unset (shards 2/5/6). Default the pair in
  test/setup.ts — the global ioredis lazyConnect mock means nothing dials.
- TriggerFailedTaskService.call() resolved its event repository via
  getEventRepository → global prisma (feature-flag read + Prisma event
  repo), so in CI the swallowed connect error returned null friendlyIds
  (shard 8). Allow injecting the repository/store pair and bind the test
  to an EventRepository over the testcontainer DB.
- once the cancelDevSessionRuns suite could collect, findLatestSession's
  hardwired global $replica was the next masked layer; give it an
  injectable client (defaulting to $replica) and pass the service's
  _replica through.

Verified by replaying the exact CI env locally (.env hidden, workflow env
vars, dead localhost DB, GITHUB_ACTIONS set): all four failing suites and
full shards 2/5/6/8 reproduce the CI failures before and pass after.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…method access

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…am regressions

Two regression tests for the write-path read seams:
- runsRepository: paginating the full keyset over interleaved cuid/ksuid runs
  enumerates every id once, no empty page, in ClickHouse (created_at DESC, run_id
  DESC) order -- fails if hydration reverts to lexical id desc across the id-space
  seam.
- runReader: a NEW-resident (ksuid) run's terminal metadata hydrates through the
  owning store, never a generic legacy replica.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@d-cs d-cs force-pushed the runops/pr06-write-path branch from 4bda37a to ea22f52 Compare July 3, 2026 17:08
@d-cs d-cs marked this pull request as ready for review July 3, 2026 17:08

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 24f52900-ef0f-43d1-973c-1bc57696ef16

📥 Commits

Reviewing files that changed from the base of the PR and between 515b897 and ea22f52.

📒 Files selected for processing (2)
  • .server-changes/run-ops-split-webapp-write-path.md
  • apps/webapp/app/models/runtimeEnvironment.server.ts
📜 Review details
⏰ Context from checks skipped due to timeout. (11)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (4, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (7, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (10, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (9, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (6, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (1, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (5, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (8, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (2, 10)
  • GitHub Check: webapp / 🧪 Unit Tests: Webapp (3, 10)
  • GitHub Check: typecheck / typecheck
⚠️ CI failures not shown inline (4)

GitHub Actions: 📝 CLAUDE.md Audit / audit: feat(run-ops): webapp write path — trigger/batch minting, idempotency routing, run lifecycle

Conclusion: failure

View job details

##[group]Run anthropics/claude-code-action@428971d2ecd6e3a7cb0ee0da2a3a8b33fdb3678d
 with:
   anthropic_***REDACTED***
   use_sticky_comment: true
   allowed_bots: devin-ai-integration[bot]
   claude_args: --max-turns 25
--model claude-opus-4-8
--allowedTools "Read,Glob,Grep,Bash(git diff:*)"
   prompt: You are reviewing a PR to check whether any CLAUDE.md files or .claude/rules/ files need updating.
## Your task
1. Run `git diff origin/main...HEAD --name-only` to see which files changed in this PR.
2. For each changed directory, check if there's a CLAUDE.md in that directory or a parent directory.
3. Determine if any CLAUDE.md or .claude/rules/ file should be updated based on the changes. Consider:
   - New files/directories that aren't covered by existing documentation
   - Changed architecture or patterns that contradict current CLAUDE.md guidance
   - New dependencies, services, or infrastructure that Claude should know about
   - Renamed or moved files that are referenced in CLAUDE.md
   - Changes to build commands, test patterns, or development workflows
## Response format
If NO updates are needed, respond with exactly:
✅ CLAUDE.md files look current for this PR.
If updates ARE needed, respond with a short list:
📝 **CLAUDE.md updates suggested:**
- `path/to/CLAUDE.md`: [what should be added/changed]
- `.claude/rules/file.md`: [what should be added/changed]
Keep suggestions specific and brief. Only flag things that would actually mislead Claude in future sessions.
Do NOT suggest updates for trivial changes (bug fixes, small refactors within existing patterns).
Do NOT suggest creating new CLAUDE.md files - only updates to existing ones.
   trigger_phrase: `@claude`
   label_trigger: claude
   branch_prefix: claude/
   use_bedrock: false
   use_vertex: false
   use_foundry: false
   classify_inline_comments: true
   use_commit_signing: false
   bot_id: 41898282
   bot_name: claude[bot]
   track_progress: false
   include_fix_links: true
   display_report: false...

GitHub Actions: 📝 CLAUDE.md Audit / 0_audit.txt: feat(run-ops): webapp write path — trigger/batch minting, idempotency routing, run lifecycle

Conclusion: failure

View job details

alt@3.0.0-beta.46 -> `@trigger.dev/yalt`@3.0.0-beta.46
  * [new tag]             `@trigger.dev/yalt`@3.0.0-beta.47 -> `@trigger.dev/yalt`@3.0.0-beta.47
  * [new tag]             `@trigger.dev/yalt`@3.0.0-beta.48 -> `@trigger.dev/yalt`@3.0.0-beta.48
  * [new tag]             `@trigger.dev/yalt`@3.0.0-beta.49 -> `@trigger.dev/yalt`@3.0.0-beta.49
  * [new tag]             `@trigger.dev/yalt`@3.0.0-beta.5 -> `@trigger.dev/yalt`@3.0.0-beta.5
  * [new tag]             `@trigger.dev/yalt`@3.0.0-beta.50 -> `@trigger.dev/yalt`@3.0.0-beta.50
  * [new tag]             `@trigger.dev/yalt`@3.0.0-beta.51 -> `@trigger.dev/yalt`@3.0.0-beta.51
  * [new tag]             `@trigger.dev/yalt`@3.0.0-beta.52 -> `@trigger.dev/yalt`@3.0.0-beta.52
  * [new tag]             `@trigger.dev/yalt`@3.0.0-beta.53 -> `@trigger.dev/yalt`@3.0.0-beta.53
  * [new tag]             `@trigger.dev/yalt`@3.0.0-beta.55 -> `@trigger.dev/yalt`@3.0.0-beta.55
  * [new tag]             `@trigger.dev/yalt`@3.0.0-beta.6 -> `@trigger.dev/yalt`@3.0.0-beta.6
  * [new tag]             `@trigger.dev/yalt`@3.0.0-beta.7 -> `@trigger.dev/yalt`@3.0.0-beta.7
  * [new tag]             build-alert-hotfix.rc1      -> build-alert-hotfix.rc1
  * [new tag]             build-alert-hotfix.rc2      -> build-alert-hotfix.rc2
  * [new tag]             build-arm-builds-rc.1       -> build-arm-builds-rc.1
  * [new tag]             build-arm-builds-rc.2       -> build-arm-builds-rc.2
  * [new tag]             build-arm-builds-rc.3       -> build-arm-builds-rc.3
  * [new tag]             build-batchid-carryover-rc.0 -> build-batchid-carryover-rc.0
  * [new tag]             build-batching-rc.1         -> build-batching-rc.1
  * [new tag]             build-batching-rc.2         -> build-batching-rc.2
  * [new tag]             build-billing-0.0.1         -> build-billing-0.0.1
  * [new tag]             build-billing-0.0.2         -> build-billing-0.0.2
  * [new tag]             build-billing-0.0.3         -> build-billing-0.0.3
  * [new tag]             build-buildinfo-rc.0        -> b...

GitHub Actions: 🔎 REVIEW.md Drift Audit / audit: feat(run-ops): webapp write path — trigger/batch minting, idempotency routing, run lifecycle

Conclusion: failure

View job details

##[group]Run anthropics/claude-code-action@428971d2ecd6e3a7cb0ee0da2a3a8b33fdb3678d
 with:
   anthropic_***REDACTED***
   use_sticky_comment: true
   allowed_bots: devin-ai-integration[bot]
   claude_args: --max-turns 30
--allowedTools "Read,Glob,Grep,Bash(git diff:*)"
   prompt: You are auditing this PR for drift against `.claude/REVIEW.md`.
## Context
`.claude/REVIEW.md` is the repo's source of truth for what AI / agent code reviewers should treat as critical findings (rolling-deploy safety, hot-table indexes, recovery-path queries, testcontainers usage, Lua versioning, etc.). It is consumed by review agents to calibrate severity. If REVIEW.md goes stale, every future agent review degrades.
## Strategy — read this first
You have a hard turn budget. Spend it on signal, not coverage. The audit is allowed to miss things; it is NOT allowed to time out.
1. Read `.claude/REVIEW.md` once, in full.
2. Run `git diff origin/main...HEAD --name-only` to get the list of changed files. Do NOT read the diff content yet.
3. Scan the file-list for relevance to REVIEW.md scope. Relevance signals: changes to Prisma schema, Redis / queue / Lua code, hot tables, recovery / restart loops, new packages, deletions of paths REVIEW.md cites. Skim everything else.
4. Open at most **5 files** total — only the ones most likely to surface a real signal. If nothing in the file-list looks relevant to any REVIEW.md rule, do NOT read any files; go straight to the verdict.
5. Form a verdict and stop. Do not exhaust the turn budget exploring.
Large PRs (>50 files changed) are a strong signal to be MORE selective, not more thorough. Pick 3-5 files at most.
## What to look for
- **Stale references** — does any REVIEW.md rule cite a file, directory, function, table, Prisma model, or package name that has been removed or renamed in this PR (or is already gone from `main`)?
- **Contradictions** — does code in this PR clearly violate a current REVIEW.md rule? (Don't re-review the PR. Only flag if REVIE...

GitHub Actions: 🔎 REVIEW.md Drift Audit / 0_audit.txt: feat(run-ops): webapp write path — trigger/batch minting, idempotency routing, run lifecycle

Conclusion: failure

View job details

y-run-engine.fix3 -> build-legacy-run-engine.fix3
  * [new tag]             build-manual-checkpoints.rc1 -> build-manual-checkpoints.rc1
  * [new tag]             build-metadata-upgrade-logging.rc1 -> build-metadata-upgrade-logging.rc1
  * [new tag]             build-metadata-upgrade-logging.rc2 -> build-metadata-upgrade-logging.rc2
  * [new tag]             build-metadata-upgrade-logging.rc3 -> build-metadata-upgrade-logging.rc3
  * [new tag]             build-new-build-system.rc.1 -> build-new-build-system.rc.1
  * [new tag]             build-otel-upgrade-rc.0     -> build-otel-upgrade-rc.0
  * [new tag]             build-otel-upgrade-rc.1     -> build-otel-upgrade-rc.1
  * [new tag]             build-pre-pull-deployments-rc.1 -> build-pre-pull-deployments-rc.1
  * [new tag]             build-prod-rescue-rc.1      -> build-prod-rescue-rc.1
  * [new tag]             build-rate-limiter-fix-rc.1 -> build-rate-limiter-fix-rc.1
  * [new tag]             build-re2.rc0               -> build-re2.rc0
  * [new tag]             build-realtime-v2-stream-fix -> build-realtime-v2-stream-fix
  * [new tag]             build-realtime-v2-stream-fix-2 -> build-realtime-v2-stream-fix-2
  * [new tag]             build-realtime-v2-stream-fix-3 -> build-realtime-v2-stream-fix-3
  * [new tag]             build-realtime-v2-stream-fix-4 -> build-realtime-v2-stream-fix-4
  * [new tag]             build-realtime-v2-stream-fix-5 -> build-realtime-v2-stream-fix-5
  * [new tag]             build-realtimestreams-dedupe -> build-realtimestreams-dedupe
  * [new tag]             build-registry-maintenance-rc.1 -> build-registry-maintenance-rc.1
  * [new tag]             build-registry-maintenance-rc.2 -> build-registry-maintenance-rc.2
  * [new tag]             build-remote-ecr-rc.0       -> build-remote-ecr-rc.0
  * [new tag]             build-reschedule-hotfix.rc1 -> build-reschedule-hotfix.rc1
  * [new tag]             build-resume-fixes.rc1      -> build-resume-fixes.rc1
  * [new tag]      ...
🧰 Additional context used
📓 Path-based instructions (8)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

Files:

  • apps/webapp/app/models/runtimeEnvironment.server.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use zod for validation in packages/core and apps/webapp

Files:

  • apps/webapp/app/models/runtimeEnvironment.server.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Files:

  • apps/webapp/app/models/runtimeEnvironment.server.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • apps/webapp/app/models/runtimeEnvironment.server.ts
apps/webapp/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

apps/webapp/**/*.{ts,tsx}: Access environment variables through the env export of env.server.ts instead of directly accessing process.env
Use subpath exports from @trigger.dev/core package instead of importing from the root @trigger.dev/core path

Use named constants for sentinel/placeholder values (e.g. const UNSET_VALUE = '__unset__') instead of raw string literals scattered across comparisons

Files:

  • apps/webapp/app/models/runtimeEnvironment.server.ts
apps/webapp/**/*.server.ts

📄 CodeRabbit inference engine (apps/webapp/CLAUDE.md)

apps/webapp/**/*.server.ts: Never use request.signal for detecting client disconnects. Use getRequestAbortSignal() from app/services/httpAsyncStorage.server.ts instead, which is wired directly to Express res.on('close') and fires reliably
Access environment variables via env export from app/env.server.ts. Never use process.env directly
Always use findFirst instead of findUnique in Prisma queries. findUnique has an implicit DataLoader that batches concurrent calls and has active bugs even in Prisma 6.x (uppercase UUIDs returning null, composite key SQL correctness issues, 5-10x worse performance). findFirst is never batched and avoids this entire class of issues

Files:

  • apps/webapp/app/models/runtimeEnvironment.server.ts
**/*.{ts,tsx,js,jsx,mts,cts,mjs,cjs}

📄 CodeRabbit inference engine (CLAUDE.md)

**/*.{ts,tsx,js,jsx,mts,cts,mjs,cjs}: Use pnpm run typecheck for changes in apps (apps/*) and internal packages (internal-packages/*), and never use build to verify those changes.
Use Vitest for tests, and never mock anything; use testcontainers instead.
Prefer static imports over dynamic import(), and only use dynamic imports for unresolved circular dependencies, genuine code-splitting needs, or conditional runtime loading.

Files:

  • apps/webapp/app/models/runtimeEnvironment.server.ts
**/*.{ts,tsx,js,jsx,mts,cts,mjs,cjs,md,mdx}

📄 CodeRabbit inference engine (CLAUDE.md)

Always import from @trigger.dev/sdk when writing Trigger.dev tasks; never use @trigger.dev/sdk/v3 or deprecated client.defineJob.

Files:

  • apps/webapp/app/models/runtimeEnvironment.server.ts
🧠 Learnings (14)
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).

Applied to files:

  • apps/webapp/app/models/runtimeEnvironment.server.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.

Applied to files:

  • apps/webapp/app/models/runtimeEnvironment.server.ts
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma error P1001 ("Can't reach database server") in TypeScript, don’t assume a single error shape. Prisma can surface P1001 via two different error classes/fields: `PrismaClientKnownRequestError` exposes it as `err.code === "P1001"` (common during mid-query connection drops), while `PrismaClientInitializationError` exposes it as `err.errorCode === "P1001"` (common on client startup failure). Therefore, predicates should use `err.code === "P1001" || err.errorCode === "P1001"`. Do not flag `err.code === "P1001"` as “unreachable/never matches,” as it is expected in production.

Applied to files:

  • apps/webapp/app/models/runtimeEnvironment.server.ts
📚 Learning: 2026-05-18T08:21:27.694Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3632
File: apps/webapp/sentry.server.ts:4-21
Timestamp: 2026-05-18T08:21:27.694Z
Learning: When handling Prisma errors for P1001 ("Can't reach database server"), do not assume it only appears under a single property name. Prisma may surface P1001 via either `PrismaClientKnownRequestError` (`err.code === "P1001"`, e.g., mid-query connection drops) or `PrismaClientInitializationError` (`err.errorCode === "P1001"`, e.g., client startup connection failure). To reliably detect the condition, check `err.code === "P1001" || err.errorCode === "P1001"`, and avoid review rules that would incorrectly flag `err.code === "P1001"` as unreachable/never-matching.

Applied to files:

  • apps/webapp/app/models/runtimeEnvironment.server.ts
📚 Learning: 2026-06-13T19:53:13.759Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3937
File: packages/trigger-sdk/skills/realtime-and-frontend/SKILL.md:258-260
Timestamp: 2026-06-13T19:53:13.759Z
Learning: When reviewing code that uses `trigger.dev/react-hooks`’s `useRealtimeRun`, preserve the call signature where the first argument is the full realtime handle object (not `handle.id`). This is intentional to maintain type-safety and is consistent with the official docs; do not suggest changing the first argument from the handle object to `handle.id`.

Applied to files:

  • apps/webapp/app/models/runtimeEnvironment.server.ts
📚 Learning: 2026-06-17T17:13:49.929Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3948
File: apps/webapp/app/routes/_app.orgs.$organizationSlug.projects.$projectParam.env.$envParam.bulk-actions.$bulkActionParam/route.tsx:48-62
Timestamp: 2026-06-17T17:13:49.929Z
Learning: In triggerdotdev/trigger.dev, within `dashboardLoader`/`dashboardAction` (or similar context resolver code) whenever you resolve an organization ID from an organization slug for RBAC/enterprise authorization scope, always read from the primary Prisma client (`prisma`), not `$replica`. Using `$replica` can hit replica-lag and cause the RBAC lookup/authorization to run without the correct org scope (bypassing intended role enforcement). Implement the slug→org lookup with `prisma.organization.findFirst(...)` (or equivalent primary-client query) and add an inline comment documenting why the primary client is required (replica lag could lead to unscoped RBAC checks).

Applied to files:

  • apps/webapp/app/models/runtimeEnvironment.server.ts
📚 Learning: 2026-06-23T13:04:21.413Z
Learnt from: carderne
Repo: triggerdotdev/trigger.dev PR: 4023
File: apps/webapp/app/services/upsertBranch.server.ts:14-18
Timestamp: 2026-06-23T13:04:21.413Z
Learning: In TypeScript, it’s valid to `import { type X }` and then use `typeof X` in a type-only position, e.g. `type Alias = z.infer<typeof X>`. The `type` modifier suppresses the runtime import, but the type checker still has the full exported type so `z.infer<typeof X>` can resolve correctly. In code reviews, don’t flag this as a TypeScript compile error as long as `typeof X` is used in a type context (e.g., with `z.infer`, `type` aliases, generics), not as a runtime value.

Applied to files:

  • apps/webapp/app/models/runtimeEnvironment.server.ts
📚 Learning: 2026-05-05T09:38:02.512Z
Learnt from: d-cs
Repo: triggerdotdev/trigger.dev PR: 3523
File: apps/webapp/app/routes/api.v3.batches.ts:178-181
Timestamp: 2026-05-05T09:38:02.512Z
Learning: When reviewing code that catches `ServiceValidationError` in `*.server.ts` files, do not blindly forward `error.status` to HTTP responses, because SVEs may be thrown with non-default statuses (e.g., 400/500) and forwarding them can cause client-visible behavioral regressions (e.g., surfacing 500s to clients). Prefer a safe default response status of `error.status ?? 422`, but only after confirming via the reachable call graph that the caught `ServiceValidationError` instances are expected to carry those non-default statuses; otherwise, normalize to `422` to avoid unexpected client-visible 5xx behavior.

Applied to files:

  • apps/webapp/app/models/runtimeEnvironment.server.ts
📚 Learning: 2026-05-12T21:04:05.815Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3542
File: apps/webapp/app/components/sessions/v1/SessionStatus.tsx:1-3
Timestamp: 2026-05-12T21:04:05.815Z
Learning: In this Remix + TypeScript codebase, do not flag a server/client boundary violation when a file imports only types from a module matching `*.server`.

Specifically, it’s safe to import types using `import type { Foo } from "*.server"` or `import { type Foo } from "*.server"` because TypeScript erases type-only imports at compile time and they emit no JavaScript, so they won’t cross the Remix server/client bundle boundary.

Only raise the boundary concern for value imports (e.g., `import { Foo }` without `type`, or `import Foo`), since those produce JavaScript output.

Applied to files:

  • apps/webapp/app/models/runtimeEnvironment.server.ts
📚 Learning: 2026-06-25T18:21:51.905Z
Learnt from: carderne
Repo: triggerdotdev/trigger.dev PR: 4039
File: apps/webapp/app/routes/invite-revoke.tsx:0-0
Timestamp: 2026-06-25T18:21:51.905Z
Learning: During the Zod v4 migration in the triggerdotdev/trigger.dev webapp, ensure any imports from `conform-to/zod` use the Zod-4 subpath: `conform-to/zod/v4` (e.g., `import { parseWithZod } from "conform-to/zod/v4"`). Do not import from the package root `conform-to/zod`, because it is the Zod 3 implementation and may load Zod-3-only symbols (e.g., `ZodBranded`, `ZodEffects`), which can throw at module load (notably with `zod4.4.3`). This should be enforced across `apps/webapp/**/*` where helpers like `parseWithZod` and `conformZodMessage` are used.

Applied to files:

  • apps/webapp/app/models/runtimeEnvironment.server.ts
📚 Learning: 2026-07-03T17:10:21.498Z
Learnt from: 0ski
Repo: triggerdotdev/trigger.dev PR: 4148
File: apps/webapp/app/models/orgMember.server.ts:149-168
Timestamp: 2026-07-03T17:10:21.498Z
Learning: In triggerdotdev/trigger.dev, `User.email` (Prisma schema: `internal-packages/database/prisma/schema.prisma`) currently does NOT use `citext` and does NOT have a `lower(email)` functional unique index. Therefore, do not introduce Prisma queries like `where: { email: { equals: <value>, mode: "insensitive" } }` (or any case-insensitive lookup) against `User.email`, because it can force sequential scans of the `users` table under load. During review, ensure email is normalized (e.g., lowercased/trimmed) before both writes and subsequent lookups, and if true case-insensitive behavior/uniqueness is required, implement it via a separate app-wide migration (e.g., switch to `citext` and/or add a functional unique index with backfill) rather than bolting it onto individual feature PRs.

Applied to files:

  • apps/webapp/app/models/runtimeEnvironment.server.ts
📚 Learning: 2026-06-04T18:16:35.386Z
Learnt from: nicktrn
Repo: triggerdotdev/trigger.dev PR: 3836
File: apps/supervisor/src/backpressure/backpressureMonitor.ts:3-5
Timestamp: 2026-06-04T18:16:35.386Z
Learning: When reviewing TypeScript in this repo, apply the rule “prefer type aliases over interfaces” only to data/object shapes and union/intersection type modeling. If an interface is being used as a behavioral contract for collaborators to implement (e.g., method-shape interfaces that define required behavior, such as `BackpressureLogger` / `BackpressureSignalSource` in `apps/supervisor/src/backpressure/backpressureMonitor.ts`), keep it as an `interface` and do not flag it as a type-alias-vs-interface violation.

Applied to files:

  • apps/webapp/app/models/runtimeEnvironment.server.ts
📚 Learning: 2026-06-09T17:58:04.699Z
Learnt from: 0ski
Repo: triggerdotdev/trigger.dev PR: 3879
File: apps/webapp/app/models/vercelIntegration.server.ts:619-630
Timestamp: 2026-06-09T17:58:04.699Z
Learning: In this codebase, outbound raw `fetch` calls should typically rely on Node/undici’s default request timeout (about ~300s) rather than adding a per-call `AbortController` + `setTimeout` wrapper inside individual functions (e.g. in files like `apps/webapp/app/models/vercelIntegration.server.ts`). During code review, do not flag the absence of a per-call timeout on a single `fetch` as an issue; if per-call timeouts are needed, they should be implemented via a codebase-wide convention (e.g., a shared fetch wrapper or documented pattern) rather than ad-hoc per-function changes.

Applied to files:

  • apps/webapp/app/models/runtimeEnvironment.server.ts
📚 Learning: 2026-05-14T14:54:39.095Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3545
File: .server-changes/agent-view-sessions.md:10-10
Timestamp: 2026-05-14T14:54:39.095Z
Learning: In the `trigger.dev` repository, do not flag inconsistent dot vs slash notation in route/path strings inside `.server-changes/*.md` files. These markdown files are consumed verbatim into the changelog, so the mixed notation (e.g., `resources.orgs.../runs.$runParam/...`) is intentional and should be preserved as-is.

Applied to files:

  • .server-changes/run-ops-split-webapp-write-path.md
🪛 LanguageTool
.server-changes/run-ops-split-webapp-write-path.md

[grammar] ~6-~6: Ensure spelling is correct
Context: ...ea: webapp type: feature --- Route the webapp write path — trigger/batch run minting,...

(QB_NEW_EN_ORTHOGRAPHY_ERROR_IDS_1)

🔇 Additional comments (2)
.server-changes/run-ops-split-webapp-write-path.md (1)

6-6: LGTM!

apps/webapp/app/models/runtimeEnvironment.server.ts (1)

361-375: LGTM!

@devin-ai-integration devin-ai-integration Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

Open in Devin Review

Comment thread apps/webapp/test/setup.ts

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Inline review comments failed to post. This is likely due to GitHub's internal server error or limits when posting large numbers of comments. If you are seeing this consistently it is likely a permissions issue. Please check "Moderation" -> "Code review limits" under your organization settings.

Actionable comments posted: 1


ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 24f52900-ef0f-43d1-973c-1bc57696ef16

📥 Commits

Reviewing files that changed from the base of the PR and between 515b897 and ea22f52.

📒 Files selected for processing (2)
  • .server-changes/run-ops-split-webapp-write-path.md
  • apps/webapp/app/models/runtimeEnvironment.server.ts
📜 Review details
🔇 Additional comments (2)
.server-changes/run-ops-split-webapp-write-path.md (1)

6-6: LGTM!

apps/webapp/app/models/runtimeEnvironment.server.ts (1)

361-375: LGTM!

🛑 Comments failed to post (1)
apps/webapp/app/models/runtimeEnvironment.server.ts (1)

5-5: 🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Inspect the resolver implementation to confirm tx-forwarding and return shape
fd -t f 'controlPlaneResolver.server.ts' -x cat -n {}

# Check downstream consumers of findEnvironmentFromRun for reliance on nested relations
rg -nP -A5 -B5 '\bfindEnvironmentFromRun\s*\(' --type=ts

Repository: triggerdotdev/trigger.dev

Length of output: 18773


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo '--- apps/webapp/app/models/runtimeEnvironment.server.ts (relevant section) ---'
sed -n '240,340p' apps/webapp/app/models/runtimeEnvironment.server.ts

echo
echo '--- type definitions / exports in apps/webapp/app/models/runtimeEnvironment.server.ts ---'
rg -n "type EnvironmentFromRun|interface EnvironmentFromRun|export type EnvironmentFromRun|export async function findEnvironmentFromRun|resolveAuthenticatedEnv" apps/webapp/app/models/runtimeEnvironment.server.ts

echo
echo '--- downstream consumers of EnvironmentFromRun.environment ---'
rg -n -A3 -B3 "EnvironmentFromRun|environment\." apps/webapp/app/models -g '*.ts' -g '*.tsx'

echo
echo '--- callsites of findEnvironmentFromRun ---'
rg -n -A4 -B4 "\bfindEnvironmentFromRun\s*\(" apps/webapp -g '*.ts' -g '*.tsx'

Repository: triggerdotdev/trigger.dev

Length of output: 18779


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo '--- runEngineHandlers usage around findEnvironmentFromRun result ---'
sed -n '597,660p' apps/webapp/app/v3/runEngineHandlers.server.ts

echo
echo '--- authenticated environment shape helpers ---'
rg -n -A4 -B4 "type AuthenticatedEnvironment|authIncludeBase|authIncludeWithParent|toAuthenticated" apps/webapp/app/models/runtimeEnvironment.server.ts

Repository: triggerdotdev/trigger.dev

Length of output: 6348


Forward tx into the environment lookup. findRun() respects the caller’s transaction, but controlPlaneResolver.resolveAuthenticatedEnv() always uses its own client, so a passed-in tx can still read the environment from a different connection/replica and break read-after-write consistency.

@d-cs

d-cs commented Jul 3, 2026

Copy link
Copy Markdown
Collaborator Author

Clarifying the test/setup.ts test-infra changes (for anyone reading the review note about "global mocks")

We are not globally mocking Redis. The ioredis handler is a thin subclass that only forces lazyConnect: true — it is the real ioredis client, it just doesn't open a socket at construction. Any test that issues a command (i.e. uses a real Redis testcontainer) connects and behaves exactly as in production.

What it actually fixes is a connection storm, not test correctness. ~20+ modules construct ioredis clients at import time (eager singletons: workers, cache stores, pub/sub, rate limiters, …). As the webapp suite has grown, importing the service graph across many test files × forked workers means all those eager clients dial simultaneously. CI has no Redis service, so that's a flood of ECONNREFUSED at shard scale that fails the suite before a single assertion runs. lazyConnect defers the dial to first use, so unused singletons stay quiet while real-Redis tests still connect and exercise real Redis.

Separately, a handful of pure-infra singletons (the workers, engine, marqs, socket.io, trace pub/sub, rate limiter, task-metadata cache) are stubbed to no-ops — but only ones no test exercises. Every test that needs real infra builds its own client from the testcontainer (redisOptions) and hits real Postgres/Redis; no product logic is mocked. (Verified: the real-Redis rate-limit tests still genuinely reject over-limit requests through this setup.)

@d-cs

d-cs commented Jul 3, 2026

Copy link
Copy Markdown
Collaborator Author

Follow-up (non-blocking): the cleaner long-term fix is to make those eager clients lazyConnect: true (or defer their construction) at the source, so importing a module never opens a Redis connection. Once that lands, this test-setup scaffold can be deleted entirely — the lazyConnect wrapper and the infra-singleton stubs only exist because construction currently has a side effect. Flagging it here so the scaffold isn't treated as load-bearing indefinitely.

@d-cs d-cs merged commit 84f3e1b into main Jul 3, 2026
40 checks passed
@d-cs d-cs deleted the runops/pr06-write-path branch July 3, 2026 17:52
d-cs added a commit that referenced this pull request Jul 3, 2026
…#4119)

## What

Extends the ClickHouse runs-replication service to fan in from multiple
Postgres sources (the control-plane DB and the run-ops DB) instead of a
single source, plus the admin operations to run and observe it.

- **Multi-source fan-in** (`services/runsReplicationService.server.ts`,
new `runsReplicationInstance.server.ts`,
`runsReplicationGlobal.server.ts`): factors the replication service into
per-source instances and a coordinator so a single ClickHouse target is
fed from more than one Postgres source.
- **Admin ops** (`routes/admin.api.v1.runs-replication.status.ts`,
`admin.api.v1.runs-replication.backfill.ts`,
`v3/services/adminWorker.server.ts`): adds a status endpoint reporting
per-source replication state and updates the backfill entrypoint for the
multi-source shape.

## Why

PR7 of the run-ops split stack, and the final piece: once run state can
live in a separate run-ops DB (earlier PRs), the analytics replication
into ClickHouse has to consume both sources so runs remain queryable
regardless of residency. Behavior-changing for the replication service
internals; the ClickHouse-facing output is unchanged (still one runs
stream), and single-source operation is preserved when the split is not
enabled.

## Tests

New vitest coverage: `runsReplicationInstance.test.ts` (per-source
instance behavior) and `runsReplicationService.part8`/`part9` suites
exercising the multi-source coordinator. Testcontainers-backed
(ClickHouse + Postgres); no mocks.

## Notes

Draft, **stacked on #4118** (`runops/pr06-write-path`). Review that
first; this diff is against it.

Server-change / changeset note to be added at stack-assembly time.

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants